Performance Evaluation of Some Clustering Algorithms and Validity Indices
Abstract
In this article, we evaluate the performance of three clustering algorithms (hard K-Means, single linkage, and a simulated annealing (SA) based technique) in conjunction with four cluster validity indices: the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and Dunn's index, a lower bound on the value of the former is theoretically estimated in order to obtain a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets, with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used to partition the data into that number of clusters.
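As an illustration of how a validity index scores a partition, the following is a minimal sketch of Dunn's index using its standard textbook definition: the smallest distance between points in different clusters divided by the largest cluster diameter. The function name `dunn_index` and the toy data are assumptions made for this example and are not taken from the article; the sketch expects at least two clusters and does not reproduce the index I or the SA-based technique.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's index for a hard partition given as a list of point lists."""
    # Largest within-cluster distance (cluster diameter); 0 if all singletons.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points lying in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Hypothetical toy data: two compact, well-separated groups.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good = [group_a, group_b]                         # matches the structure
bad = [group_a[:2], group_a[2:] + group_b]        # splits one group

print(dunn_index(good), dunn_index(bad))
```

Larger values indicate compact, well-separated clusters, so the partition that matches the two groups scores higher than the one that splits a group. Evaluating such an index for each candidate number of clusters and keeping the best-scoring one is the general idea behind using validity indices to choose K.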
Similar Resources
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Clustering Validity Assessment: Finding the Optimal Partitioning of a Data Set
Clustering is a mostly unsupervised procedure and the majority of the clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. As a consequence, in most applications the resulting clustering scheme requires some sort of evaluation as regards its validity. In this paper we present a clustering validity procedure, which evaluates the results of c...
Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions
Clustering techniques are widely used in “Web Usage Mining” to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are...
Selecting Ensemble Members in Cluster Ensembles Using Voting
Clustering is the process of dividing a dataset into subsets called clusters, so that objects within a cluster are similar to each other and different from objects in the other clusters. So far, many algorithms based on different approaches have been created for clustering. An effective choice is to combine two or more of these algorithms to solve the clustering problem. Ensemb...
Canonical PSO Based K-Means Clustering Approach for Real Datasets
The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different ...
Journal:
- IEEE Trans. Pattern Anal. Mach. Intell.
Volume 24
Pages -
Publication date 2002